ABSTRACT

The purpose of this study was to determine whether
students' responses to the Ullman student evaluation of teaching, the
Student Evaluation of Teaching (SET) instrument, were biased by their
achievement. The instrument was administered to all 325 students
taking a second course in calculus with economic applications, near
the end of the course, and separate evaluation ratings were obtained
for the course, the teachers, and the examinations. An achievement
rating (average midterm score) was also obtained for each student,
and correlations between this rating and the SET ratings were
investigated for each section of the class. Also, a one-way analysis
of variance was run to investigate a possible relation between these
ratings over the entire class. In most sections, the correlations
between achievement and SET ratings were positive, but only 24 out of
48 were significant at the 5 percent level. The analysis of variance
investigation revealed no further relationships. The authors conclude
that the SET instrument under consideration may give unbiased
evaluations for one teacher and biased evaluations for another, and,
as such, is not to be recommended for general use. (MM)
Francis Dwyer of Pennsylvania State University, in his exhaustive
survey of the literature on teacher evaluations, notes that evaluation
ratings are subjective and have many inherent limitations. Many
teachers feel that students are not capable of separating their
personal feelings concerning a teacher from their evaluation of his
teaching. Consequently, these teachers believe that student
evaluations are biased toward "liking" the teacher if the students are
doing well in the course and "disliking" the teacher if they are doing
poor work in the course. Furthermore, such teachers often feel student
evaluations are of no use (at least to them) and feel "threatened" if
the results were to be available to others for interpretation. However,
we cannot overlook the increased desire of students to evaluate what
they experience, whatever that may be. Therefore, we need to know more
about how to interpret SET (student evaluation of teaching) results.

In this paper we examine one aspect of this interpretation - the 
relationship between student evaluation and student achievement. 

The Problem. Mathematics teachers in high school or college who teach
required mathematics courses are particularly prone to feel threatened
by student evaluations because many students view mathematics as
difficult, unpleasant or uninteresting. If it can be shown (relative
to a particular evaluation instrument) that students do give
evaluations that are unbiased with respect to course achievement, then
the classroom teacher could feel that valid student evaluations are
beneficial and meaningful. It is the purpose of this paper to report
on the results of a recent study concerning the usage of student
evaluations of the mathematics teacher and of mathematics instruction.

The problem is stated in the following question. Do students 
give unbiased evaluations of their mathematics teachers? It is not 
our purpose to investigate the validity of student evaluations. 

(Students could give their teacher an unbiased, but also invalid,
evaluation rating.)

The Investigation. A SET instrument developed by Dr. Robert W. Ullman,
Director of the Office of Evaluation at The Ohio State University, was
used in the experiment. His instrument has been used at 28 different
universities. The instrument consists of 48 questions divided into 3
categories - course, instructor and examinations. There is approximately
a 50-50 split between positively phrased questions and negatively phrased
questions. Students were given the choice of 4 responses - strongly
agree, agree, disagree and strongly disagree. The instrument was
given to all students enrolled in Mathematics 117 near the end of the
course in the spring quarter 1969. Mathematics 117 is the second course
in a calculus with economic applications sequence for non-math and
non-physical science majors offered at The Ohio State University. There
were 16 individual sections of the course taught by thirteen different
teachers (advanced graduate students and instructors in the department).

Each section instructor followed a common syllabus and all students
took identical departmental examinations given in the evening.
The students were asked to specify their average midterm score in
addition to the other information requested on the Ullman SET instrument.
The forms were processed through a Digitek 100 optical scanning system
(IBM card output). A computer program was written that assigned an
evaluation rating (R) to each category (course, instructor and examinations)
by the following formula:

$$R = \sum_{i=1}^{N} r_i$$

where N is the total number of questions in the category and
$r_i = 3$ if the response to the i-th question is "strongly positive,"
$r_i = 2$ if the response is "positive," $r_i = 1$ if the response is
"negative," and $r_i = 0$ if the response is "strongly negative."
("Strongly positive" means a "strongly agree" response to a "positive"
question or a "strongly disagree" response to a "negative" question.
Similarly for "positive," "negative," and "strongly negative.") The
ranges of evaluation ratings were 0-60 for the course, 0-66 for the
teacher and 0-18 for the examinations.
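The scoring rule can be illustrated with a short sketch. This is not the authors' program (which is not reproduced in the paper); the function names and the way the phrasing of each question is passed in are assumptions made for illustration only.

```python
# Score given to each response on a positively phrased question.
# On a negatively phrased question the scale is reversed, so that
# "strongly disagree" counts as "strongly positive" (3 points).
POSITIVE_SCORES = {
    "strongly agree": 3,
    "agree": 2,
    "disagree": 1,
    "strongly disagree": 0,
}


def item_score(response, positively_phrased):
    """Return r_i (0-3) for one response (hypothetical helper)."""
    score = POSITIVE_SCORES[response]
    return score if positively_phrased else 3 - score


def category_rating(responses, phrasing):
    """Return R, the sum of r_i over the N questions of one category.

    responses: list of response strings, one per question.
    phrasing:  list of booleans, True where the question is positively phrased.
    """
    if len(responses) != len(phrasing):
        raise ValueError("one phrasing flag is needed per response")
    return sum(item_score(r, p) for r, p in zip(responses, phrasing))
```

With this rule a category of 20 questions has a maximum rating of 60, which agrees with the 0-60 range reported for the course category.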

The following data were gathered for each student at the conclusion
of the experiment - student number, section number (1 through 16), course
evaluation rating, instructor evaluation rating, examination evaluation
rating and average midterm score (range 0-100). Correlation statistics
(Pearson) were developed for each section relating the achievement
variable (average midterm score) with (1) the course evaluation rating
variable, (2) the instructor evaluation rating variable and (3) the
examinations evaluation rating variable.
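As a rough sketch of the per-section Pearson computation, the following groups student records by section and computes r for each group. The record layout (section, average midterm score, rating) and the function names are assumptions for illustration; the original program is not described in the paper.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either variable is constant.
    return sxy / math.sqrt(sxx * syy)


def correlations_by_section(records):
    """records: iterable of (section, midterm_score, rating) tuples.

    Returns a dict mapping each section number to its r.
    """
    grouped = defaultdict(lambda: ([], []))
    for section, midterm, rating in records:
        grouped[section][0].append(midterm)
        grouped[section][1].append(rating)
    return {s: pearson_r(xs, ys) for s, (xs, ys) in grouped.items()}
```

The same function would be called three times per section, once for each of the course, instructor and examination ratings.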

The following were the correlation hypotheses of the experiment
relative to each section.

There is a significant positive correlation between student
evaluation ratings of course (teacher, examinations) and
student achievement.

These hypotheses were tested in the usual null formulation, denoted
by (H1, H2, H3).

(H1, H2, H3): The correlation coefficient between the variables of
course (teacher, examination) rating and achievement is zero.

The graphs in Figure 1 contain correlation plots for two sample
sections. In the first there is a significant correlation while in
the second the correlation is not significant.

Figure 1. Course rating and achievement correlation plots.
In addition, the students were identified as belonging to one of
five classification groups defined as follows:

Group 5 - average midterm score 90 or above (100 possible),

Group 4 - average midterm score between 79 and 90,

Group 3 - average midterm score between 69 and 80,

Group 2 - average midterm score between 59 and 70,

Group 1 - average midterm score less than 60.
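The classification can be sketched as a small function. It assumes the group boundaries fall at 60, 70, 80 and 90 (so that "between 79 and 90" covers scores from 80 up to but not including 90); how fractional averages such as 79.5 were treated is not stated, and the function name is hypothetical.

```python
def achievement_group(avg_midterm):
    """Map an average midterm score (0-100) to classification group 1-5.

    Assumption: boundaries at 60, 70, 80 and 90, lower bound inclusive.
    """
    if not 0 <= avg_midterm <= 100:
        raise ValueError("average midterm score must lie in 0-100")
    if avg_midterm >= 90:
        return 5
    if avg_midterm >= 80:
        return 4
    if avg_midterm >= 70:
        return 3
    if avg_midterm >= 60:
        return 2
    return 1
```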



A one-way analysis of variance model was applied with the course
(instructor, examination) rating as the dependent variable and an effect
for classification (defined above) as the independent variable. Relative
to each section, the following were the null hypotheses tested in the
analysis of variance investigation.

(H4, H5, H6): There is no effect for achievement rank on the course
(teacher, examination) evaluation ratings.

It is worth noting that if there is a significant non-linear relationship
between the two variables, the analysis of variance model can detect
differences in effects for achievement ranking that a correlation
analysis cannot. For example, suppose the correlation plot looked
like the first graph in Figure 2. A correlation analysis would indicate
no (linear) relationship where there is indeed an interesting (and
significant) relationship. The F test should indicate that some
significant relationship exists.

It should be noted that the F test analysis uses grouped data and
consequently it is, in one sense, weaker than a correlation analysis.
This means significant correlations can exist between the variables
while the F test analysis indicates an insignificant relationship.

Figure 2. Hypothetical Example.
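The contrast between the two analyses can be made concrete with invented numbers. The sketch below uses hypothetical U-shaped data (not data from the study): ratings are high in the lowest and highest achievement groups and low in the middle one. The correlation with group number is zero, while the one-way F ratio is very large.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_way_f(groups):
    """F ratio of a one-way analysis of variance on a list of groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical ratings for achievement groups 1..5 (U-shaped pattern).
ratings_by_group = {
    1: [39, 40, 41],
    2: [29, 30, 31],
    3: [19, 20, 21],
    4: [29, 30, 31],
    5: [39, 40, 41],
}
xs = [g for g, vals in ratings_by_group.items() for _ in vals]
ys = [v for vals in ratings_by_group.values() for v in vals]

r = pearson_r(xs, ys)                            # zero: no linear trend
f = one_way_f(list(ratings_by_group.values()))   # large: group means differ
```

The converse case, a significant correlation with an insignificant F, arises because grouping discards the within-group ordering of the scores.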

All hypotheses were tested at the .05 level of significance using
two-tailed tests. The basic results of the study are summarized in
the following tables. (r is the observed correlation coefficient and
F is the calculated F statistic in the analysis of variance investigation.)
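The paper does not say which test statistic was used for the correlations. A common choice, assumed here, is the transformation t = r sqrt(n - 2) / sqrt(1 - r^2), compared with the two-tailed .05 critical value of Student's t on n - 2 degrees of freedom. The function names are hypothetical; the critical values are the standard tabulated ones for the section sizes that occur in the study (15 to 30 students).

```python
import math

# Two-tailed critical values of Student's t at the .05 level,
# keyed by degrees of freedom (n - 2). Standard tabulated values.
T_CRIT_05 = {
    13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 18: 2.101,
    19: 2.093, 20: 2.086, 22: 2.074, 23: 2.069, 28: 2.048,
}


def t_statistic(r, n):
    """t statistic for testing zero correlation (assumed test, see lead-in)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def reject_zero_correlation(r, n):
    """True if the zero-correlation hypothesis is rejected at the .05 level."""
    return abs(t_statistic(r, n)) > T_CRIT_05[n - 2]
```

Under this assumption, r = .6558 with N = 18 and r = .3787 with N = 30 are rejected, while r = .4388 with N = 17 is not, which matches the decisions printed for those sections in the course rating summary.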
TABLE I. Course Rating Summary

Section   Sample                    Reject                Reject
Number    N        r                H1        F           H4

 1        18       .6558            Yes       7.5477      Yes
 2        21       .0881            No        0.8270      No
 3        25       [illegible]      No        0.7388      No
 4        20       .5184            Yes       1.6775      No
 5        17       .4388            No        1.8855      No
 6        25       .4685            Yes       3.5777      Yes
 7        15       .3718            No        1.8284      No
 8        15       .5324            Yes       2.3282      No
 9        22       -.0013           No        0.3160      No
10        16       .3660            No        1.0853      No
11        22       .4588            Yes       1.2698      No
12        30       .3787            Yes       1.4766      No
13        17       .0031            No        0.8956      No
14        17       .1813            No        1.2148      No
15        21       .5743            Yes       1.9105      No
16        24       .1754            No        1.0117      No
TABLE II. Teacher Rating Summary

Section   Sample                    Reject                Reject
Number    N        r                H2        F           H5

 1        18       .5989            Yes       5.7424      Yes
 2        21       -.1225           No        0.8381      No
 3        25       -.0956           No        1.6780      No
 4        20       .3127            No        0.5119      No
 5        17       .3020            No        0.7033      No
 6        25       .3586            No        2.0142      No
 7        15       .5042            No        2.1399      No
 8        15       .3284            No        0.3465      No
 9        22       -.1013           No        1.7996      No
10        16       -.0160           No        0.2982      No
11        22       .0801            No        0.1997      No
12        30       .2142            No        1.0632      No
13        17       .3801            No        1.1310      No
14        17       .3514            No        1.5377      No
15        21       .4238            No        2.3657      No
16        24       -.0848           No        2.2916      No
TABLE III. Examination Rating Summary

Section   Sample                    Reject                Reject
Number    N        r                H3        F           H6

 1        18       .6091            Yes       2.0520      No
 2        21       .3099            No        0.7124      No
 3        25       .2455            No        1.2285      No
 4        20       .5470            No        1.5865      No
 5        17       .6645            Yes       4.2614      Yes
 6        25       .4917            Yes       1.2300      No
 7        15       .0722            No        1.3297      No
 8        15       .5108            No        0.5188      No
 9        22       .2111            No        0.9740      No
10        16       .5535            Yes       2.1796      No
11        22       .5321            Yes       3.1690      No
12        30       -.0235           No        0.1278      No
13        17       .5452            Yes       2.0609      No
14        17       .3314            No        1.7851      No
15        21       .1991            No        0.3322      No
16        24       .1408            No        1.2421      No
In Table IV means and standard deviations by section are noted for
the variables in the correlation analysis. Table V contains the
analysis of variance data for the two sample sections shown in
Figure 1. Complete data from the study are available from the authors
by request.

TABLE IV. Means (μ) and Standard Deviations (σ)

              Achievement     Course Rating   Teacher Rating         Examination Rating
              μ      σ        μ      σ        μ      σ               μ      σ

Section 1     74.3   14.8     31.5   11.3     45.3   10.6            9.9    3.9
Section 2     73.6   11.2     28.9   9.5      49.5   [illegible]     10.1   1.9
Section 3     73.0   11.3     28.2   7.1      49.7   3.6             10.2   2.2
Section 4     74.7   13.8     35.2   8.0      47.6   11.3            11.2   3.0
Section 5     66.7   17.4     23.0   8.5      32.0   8.6             9.9    [illegible]
Section 6     76.6   11.8     30.0   8.7      49.3   7.2             10.9   2.1
Section 7     59.4   13.4     24.4   7.6      42.0   9.4             8.7    2.4
Section 8     67.6   12.3     32.2   7.5      43.1   5.3             11.3   2.6
Section 9     79.1   8.7      31.4   6.7      49.3   6.7             11.0   2.2
Section 10    63.3   13.5     29.8   6.7      47.1   5.5             10.4   3.3
Section 11    71.0   15.3     29.1   8.4      46.4   8.4             9.5    3.1
Section 12    65.0   17.7     31.0   6.0      47.2   6.9             10.5   3.0
Section 13    68.4   12.8     26.2   7.9      38.1   12.7            8.8    [illegible]
Section 14    65.4   20.0     26.7   6.1      28.7   8.4             10.0   2.2
Section 15    78.0   13.4     33.7   9.5      52.5   6.6             11.8   2.6
Section 16    72.0   13.1     31.0   6.6      47.0   5.6             10.2   3.0
TABLE V. Analysis of Variance Data

Section 4 (Course)
Number of Treatment Groups = 5

Treatment   Sample   Mean       Standard
Group       Size                Deviation

1           6        30.8333    5.9805
2           0        0.         0.
3           3        30.0000    8.6602
4           8        38.1250    7.3180
5           3        41.3333    8.3267

Analysis of Variance Table

                  Sum of       DF    Mean       F-Ratio
                  Squares            Square

Between Groups    376.8255     4     94.2064    1.6775
Within Groups     842.3698     15    56.1580
Total             1219.1935    19


Section 3 (Course)
Number of Treatment Groups = 5

Treatment   Sample   Mean       Standard
Group       Size                Deviation

1           3        30.0000    2.6457
2           8        25.7500    10.1524
3           4        27.2500    6.1847
4           8        31.3750    3.7009
5           2        25.5000    10.6066

Analysis of Variance Table

                  Sum of       DF    Mean       F-Ratio
                  Squares            Square

Between Groups    156.4150     4     39.1037    .7388
Within Groups     1058.6251    20    52.9313
Total             1215.0401    24
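The entries of an analysis of variance table can be recomputed from the group sizes, means and standard deviations alone, which gives a convenient check on the printed figures. The sketch below is an illustration, not the authors' program; the function name is hypothetical, and skipping empty groups is an assumption of the sketch (the printed Section 4 table keeps 4 between-group degrees of freedom even though one group is empty).

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (sample_size, mean, standard_deviation) tuples.
    Returns (ss_between, ss_within, f_ratio). Empty groups are skipped.
    """
    used = [(n, m, s) for n, m, s in groups if n > 0]
    total_n = sum(n for n, _, _ in used)
    grand_mean = sum(n * m for n, m, _ in used) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in used)
    # Each sample variance s^2 equals the group's sum of squares / (n - 1).
    ss_within = sum((n - 1) * s ** 2 for n, _, s in used)
    df_between = len(used) - 1
    df_within = total_n - len(used)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


# Summary statistics for Section 3 (Course), groups 1 through 5.
section3 = [
    (3, 30.0000, 2.6457),
    (8, 25.7500, 10.1524),
    (4, 27.2500, 6.1847),
    (8, 31.3750, 3.7009),
    (2, 25.5000, 10.6066),
]
ss_between, ss_within, f_ratio = anova_from_summary(section3)
```

For Section 3 this gives a within-groups sum of squares of about 1058.63 and an F ratio of about .7388, in agreement with the printed values.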





Conclusions and Recommendations. It is apparent (after considering the
results summarized in Tables I, II and III) that only in section 1 was
the correlation between teacher evaluation rating and student
achievement significant (the higher evaluation ratings were associated
with the "good" students while the lower ratings were associated with
the "poor" students). In the fifteen remaining sections the correlation
between evaluation rating and student achievement was not statistically
significant. However, in eight of these sections the observed
correlation coefficient was positive and the "no correlation" hypothesis
was near the rejection level.

In almost one-half of the sections the correlations between course
evaluation rating and student achievement and the correlations between
examination evaluation rating and student achievement were
significant. Again in both cases the better students gave the more
favorable evaluation ratings. It is important to observe that the
analysis of variance investigation indicated there was no significant
new (not previously implied by the correlation analysis) relationship
between the SET ratings and student achievement scores. What do all
these statistics mean to the mathematics teacher?

We can conclude that some teachers (even those teaching required
mathematics courses) might expect to receive unbiased evaluation
ratings from their students. Also we can conclude that the ratings
of some teachers will be biased, the favorable ratings coming from
the good students and less favorable ratings coming from the poorer
students. We assume that a low correlation is evidence of a lack of
bias. However, a low correlation could result from instrument
insensitivity or from the small numbers (15-30) involved in each
section. We also assume the evaluation ratings obtained from the Ullman
instrument are valid. It has been our experience that this is usually
the case. We found that the SET ratings would rank most of our teachers
in the same order as we would. However, some staff members that we know
to be excellent teachers from our frequent observations and other close
contacts are sometimes rated unfavorably. We cannot recommend any
evaluation instrument (including the Ullman instrument) nor can we
recommend that administrators encourage student evaluations without a
careful investigation of psychological and "local" considerations. A
word of warning is in order. It is possible that inexperienced
teachers might direct their teaching activities toward developing
"favorable" evaluation ratings. If SET results are made "public"
(used in salary and promotion considerations), then even experienced
teachers might also direct their teaching activities towards
developing "favorable" ratings! Such activities need not be in
the students' best interest and possibly would result in ineffective
teaching. We do recommend that an instructor who uses any evaluation
instrument perform a correlation or regression analysis on the
variables (evaluation rating and student achievement) and then
interpret the results accordingly.
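For an instructor who wishes to try the regression analysis suggested here, a minimal least-squares sketch follows. The function name and the sample numbers are invented for illustration; they are not data from the study.

```python
def least_squares_line(xs, ys):
    """Fit rating = intercept + slope * achievement by ordinary least squares.

    xs: achievement scores (e.g. average midterm scores).
    ys: evaluation ratings from the same students.
    Returns (slope, intercept).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("achievement scores are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example: ratings rise 0.4 points per midterm point.
slope, intercept = least_squares_line([50, 60, 70, 80, 90], [20, 24, 28, 32, 36])
```

A slope near zero is consistent with ratings that do not depend on achievement, subject to the cautions about small sections and instrument sensitivity noted in the text.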